122 research outputs found

    Brook Auto: High-Level Certification-Friendly Programming for GPU-powered Automotive Systems

    Get PDF
    Modern automotive systems require increased performance to implement Advanced Driving Assistance Systems (ADAS). GPU-powered platforms are promising candidates for such computational tasks, however current low-level programming models challenge the accelerator software certification process, while they limit the hardware selection to a fraction of the available platforms. In this paper we present Brook Auto, a high-level programming language for automotive GPU systems which removes these limitations. We describe the challenges and solutions we faced in its implementation, as well as a complete evaluation in terms of performance and productivity, which shows the effectiveness of our method.This work has been partially supported by the Spanish Ministry of Science and Innovation under grant TIN2015-65316-P and the HiPEAC Network of Excellence.Peer ReviewedPostprint (author's final draft

    Towards general purpose computations on low-end mobile GPUs

    Get PDF
    GPUs traditionally offer high computational capabilities, frequently higher than their CPU counterparts. While high-end mobile GPUs vendors introduced recently general purpose APIs, such as OpenCL, to leverage their computational power, the vast majority of the mobile devices lack such support. Despite that their graphics APIs have similarities with desktop graphics APIs, they have significant differences, which prevent the use of well-known techniques that offer general-purpose computations over such interfaces. In this paper we show how these obstacles can be overcome, in order to achieve general purpose programmability of these devices. As a proof of concept we implemented our proposal on a real embedded platform (Raspberry Pi) based on Broadcom's VideoCore IV GPU, obtaining a speedup of 7.2Ă— over the CPU.This work has been partially supported by the Spanish Ministry of Science and Innovation under grant TIN2015-65316-P and the HiPEAC Network of Excellence. Leonidas Kosmidis is also funded by the Spanish Ministry of Education under the FPU grant AP2010-4208.Postprint (author's final draft

    DO-178C certification of general-purpose GPU software: review of existing methods and future directions

    Get PDF
    —General-Purpose GPU software is considered for use in avionics to satisfy the increased computational requirements of future systems. Therefore, it needs to be certified following the DO-178C guidance as all airborne software. In this work, we review the existing methods in the literature, we analyse their advantages and disadvantages, and we discuss how they can be combined to obtain certification with lower effort and cost. Our focus is restricted on application-level software, under the premise that successful completion of verification of avionics graphics GPU software products has been demonstrated, so their GPU compiler has been considered acceptable for these already DO-178C certified products, or existing qualified GPU compilers exist. Finally, we discuss upcoming solutions for certified general purpose GPU computing .This work was performed within the Airbus TANIAGPU Project ADS (E/200) in collaboration with the project partners Airbus Defence and Space, Madrid, Spain and CoreAVI, Canada. It was also partially supported by the European Space Agency (ESA) through the GPU4S (GPU for Space) activity, the Spanish Ministry of Economy and Competitiveness under grants PID2019-107255GB and FJCI-2017-34095 (Spanish State Research Agency / http://dx.doi.org/10.13039/501100011033) and the HiPEAC Network of Excellence.Peer ReviewedPostprint (author's final draft

    Compiler support for an AI-oriented SIMD extension of a space processor

    Get PDF
    In this on going research paper, we present our work on the compiler support for an AI-oriented SIMD Extension, called SPARROW. The SPARROW hardware design has been developed during a recently defended, awardwinning Master Thesis and is targeting Cobham Gaisler's space processors Leon3 and NOEL-V. We present the compiler support we have included in two compiler toolchains, gcc and llvm as well as a SIMD intrinsics library for easy programmability. Compiler modifications are kept to minimum in order to enable incremental qualification of the toolchains. We present our experience working with the two compilers and performance results for the two compilers on top an FPGA implementation of the target space processor.This work was funded by the Ministerio de Ciencia e Innovacion - Agencia Estatal de Investigacion (PID2019-107255GB-C21/AEI/10.13039/501100011033 and IJC-2020-045931-I) and partially supported by the European Space Agency (ESA) through the GPU4S (GPU for Space) activity and the HiPEAC Network of Excellence.Peer ReviewedPostprint (author's final draft

    Enabling caches in probabilistic timing analysis

    Get PDF
    Hardware and software complexity of future critical real-time systems challenges the scalability of traditional timing analysis methods. Measurement-Based Probabilistic Timing Analysis (MBPTA) has recently emerged as an industrially-viable alternative technique to deal with complex hardware/software. Yet, MBPTA requires certain timing properties in the system under analysis that are not satisfied in conventional systems. In this thesis, we introduce, for the first time, hardware and software solutions to satisfy those requirements as well as to improve MBPTA applicability. We focus on one of the hardware resources with highest impact on both average performance and Worst-Case Execution Time (WCET) in current real-time platforms, the cache. In this line, the contributions of this thesis follow three different axes: hardware solutions and software solutions to enable MBPTA, and MBPTA analysis enhancements in systems featuring caches. At hardware level, we set the foundations of MBPTA-compliant processor designs, and define efficient time-randomised cache designs for single- and multi-level hierarchies of arbitrary complexity, including unified caches, which can be time-analysed for the first time. We propose three new software randomisation approaches (one dynamic and two static variants) to control, in an MBPTA-compliant manner, the cache jitter in Commercial off-the-shelf (COTS) processors in real-time systems. To that end, all variants randomly vary the location of programs' code and data in memory across runs, to achieve probabilistic timing properties similar to those achieved with customised hardware cache designs. We propose a novel method to estimate the WCET of a program using MBPTA, without requiring the end-user to identify worst-case paths and inputs, improving its applicability in industry. We also introduce Probabilistic Timing Composability, which allows Integrated Systems to reduce their WCET in the presence of time-randomised caches. With the above contributions, this thesis pushes the limits in the use of complex real-time embedded processor designs equipped with caches and paves the way towards the industrialisation of MBPTA technology.La complejidad de hardware y software de los sistemas críticos del futuro desafía la escalabilidad de los métodos tradicionales de análisis temporal. El análisis temporal probabilístico basado en medidas (MBPTA) ha aparecido últimamente como una solución viable alternativa para la industria, para manejar hardware/software complejo. Sin embargo, MBPTA requiere ciertas propiedades de tiempo en el sistema bajo análisis que no satisfacen los sistemas convencionales. En esta tesis introducimos, por primera vez, soluciones hardware y software para satisfacer estos requisitos como también mejorar la aplicabilidad de MBPTA. Nos centramos en uno de los recursos hardware con el máximo impacto en el rendimiento medio y el peor caso del tiempo de ejecución (WCET) en plataformas actuales de tiempo real, la cache. En esta línea, las contribuciones de esta tesis siguen 3 ejes distintos: soluciones hardware y soluciones software para habilitar MBPTA, y mejoras de el análisis MBPTA en sistemas usado caches. A nivel de hardware, creamos las bases del diseño de un procesador compatible con MBPTA, y definimos diseños de cache con tiempo aleatorio para jerarquías de memoria con uno y múltiples niveles de cualquier complejidad, incluso caches unificadas, las cuales pueden ser analizadas temporalmente por primera vez. Proponemos tres nuevos enfoques de aleatorización de software (uno dinámico y dos variedades estáticas) para manejar, en una manera compatible con MBPTA, la variabilidad del tiempo (jitter) de la cache en procesadores comerciales comunes en el mercado (COTS) en sistemas de tiempo real. Por eso, todas nuestras propuestas varían aleatoriamente la posición del código y de los datos del programa en la memoria entre ejecuciones del mismo, para conseguir propiedades de tiempo aleatorias, similares a las logradas con diseños hardware personalizados. Proponemos un nuevo método para estimar el WCET de un programa usando MBPTA, sin requerir que el usuario dentifique los caminos y las entradas de programa del peor caso, mejorando así la aplicabilidad de MBPTA en la industria. Además, introducimos la composabilidad de tiempo probabilística, que permite a los sistemas integrados reducir su WCET cuando usan caches de tiempo aleatorio. Con estas contribuciones, esta tesis empuja los limites en el uso de diseños complejos de procesadores empotrados en sistemas de tiempo real equipados con caches y prepara el terreno para la industrialización de la tecnología MBPTA

    SPARROW: A low-cost hardware/software co-designed SIMD microarchitecture for AI operations in space processors

    Get PDF
    Recently there is an increasing interest in the use of artificial intelligence for on-board processing as indicated by the latest space missions, which cannot be satisfied by existing low-performance space-qualified processors. Although COTS AI accelerators can provide the required performance, they are not designed to meet space requirements. In this work, we co-design a low-cost SIMD micro-architecture integrated in a space qualified processor, which can significantly increase its performance. Our solution has no impact on the processor's 100 MHz frequency and consumes minimal area thanks to its innovative design compared to conventional vector micro-architectures. For the minimum configuration of our baseline space processor, our results indicate a performance boost of up to 9.3× for commonly used AI-related and image processing algorithms and 5.5× faster for a complex, space-relevant inference application with just 30% area increase.This work was supported by ESA through the GPU4S (GPU for Space) project, the Spanish Ministry of Economy and Competitiveness under grants PID2019- 107255GB and FJCI-2017-34095 (Spanish State Research Agency / http://dx.doi.org/10.13039/501100011033), the European Commission’s Horizon 2020 programme under the UP2DATE project (grant agreement 871465), the HiPEAC Network of Excellence and a first prize in Xilinx’s University Open Hardware Competition 2021 in the student category.Peer ReviewedPostprint (author's final draft

    High-Integrity Performance Monitoring Units in Automotive Chips for Reliable Timing V&V

    Get PDF
    As software continues to control more system-critical functions in cars, its timing is becoming an integral element in functional safety. Timing validation and verification (V&V) assesses softwares end-to-end timing measurements against given budgets. The advent of multicore processors with massive resource sharing reduces the significance of end-to-end execution times for timing V&V and requires reasoning on (worst-case) access delays on contention-prone hardware resources. While Performance Monitoring Units (PMU) support this finer-grained reasoning, their design has never been a prime consideration in high-performance processors - where automotive-chips PMU implementations descend from - since PMU does not directly affect performance or reliability. To meet PMUs instrumental importance for timing V&V, we advocate for PMUs in automotive chips that explicitly track activities related to worst-case (rather than average) softwares behavior, are recognized as an ISO-26262 mandatory high-integrity hardware service, and are accompanied with detailed documentation that enables their effective use to derive reliable timing estimatesThis work has also been partially supported by the Spanish Ministry of Economy and Competitiveness (MINECO) under grant TIN2015-65316-P and the HiPEAC Network of Excellence. Jaume Abella has been partially supported by the MINECO under Ramon y Cajal postdoctoral fellowship number RYC-2013-14717. Enrico Mezzet has been partially supported by the Spanish Ministry of Economy and Competitiveness under Juan de la Cierva-IncorporaciĂłn postdoctoral fellowship number IJCI-2016- 27396.Peer ReviewedPostprint (author's final draft

    A confidence assessment of WCET estimates for software time randomized caches

    Get PDF
    Obtaining Worst-Case Execution Time (WCET) estimates is a required step in real-time embedded systems during software verification. Measurement-Based Probabilistic Timing Analysis (MBPTA) aims at obtaining WCET estimates for industrial-size software running upon hardware platforms comprising high-performance features. MBPTA relies on the randomization of timing behavior (functional behavior is left unchanged) of hard-to-predict events like the location of objects in memory — and hence their associated cache behavior — that significantly impact software's WCET estimates. Software time-randomized caches (sTRc) have been recently proposed to enable MBPTA on top of Commercial off-the-shelf (COTS) caches (e.g. modulo placement). However, some random events may challenge MBPTA reliability on top of sTRc. In this paper, for sTRc and programs with homogeneously accessed addresses, we determine whether the number of observations taken at analysis, as part of the normal MBPTA application process, captures the cache events significantly impacting execution time and WCET. If this is not the case, our techniques provide the user with the number of extra runs to perform to guarantee that cache events are captured for a reliable application of MBPTA. Our techniques are evaluated with synthetic benchmarks and an avionics application.The research leading to these results has received funding from the European Community’s Seventh Framework Programme [FP7/2007-2013] under the PROXIMA Project (www.proxima-project.eu), grant agreement no 611085. This work has also been partially supported by the Spanish Ministry of Science and Innovation under grant TIN2015-65316, the HiPEAC Network of Excellence, and COST Action IC1202: Timing Analysis On Code-Level (TACLe). Jaume Abella has been partially supported by the Ministry of Economy and Competitiveness under Ramon y Cajal postdoctoral fellowship number RYC-2013-14717.Peer ReviewedPostprint (author's final draft

    High-Integrity GPU Designs for Critical Real-Time Automotive Systems

    Get PDF
    Autonomous Driving (AD) imposes the use of highperformance hardware, such as GPUs, to perform object recognition and tracking in real-time. However, differently to the consumer electronics market, critical real-time AD functionalities require a high degree of resilience against faults, in line with the automotive ISO26262 functional safety standard requirements. ISO26262 imposes the use of some source of independent redundancy for the most critical functionalities so that a single fault cannot lead to a failure, being dual core lockstep (DCLS) with diversity the preferred choice for computing devices. Unfortunately, GPUs do not support diverse DCLS by construction, thus failing to meet ISO26262 requirements efficiently. In this paper we propose lightweight modifications to GPUs to enable diverse DCLS for critical real-time applications without diminishing their performance for non-critical applications. In particular, we show how enabling specific mechanisms for software-controlled kernel scheduling in the GPU, allows guaranteeing that redundant kernels can be executed in different resources so that a single fault cannot lead to a failure, as imposed by ISO26262. Our results on a GPU simulator and an NVIDIA GPU prove the viability of the approach and its effectiveness on high-performance GPU designs needed for AD systems.This work has been partially supported by the Spanish Ministry of Economy and Competitiveness (MINECO) under grant TIN2015-65316-P and the HiPEAC Network of Excellence. Jaume Abella has been partially supported by the MINECO under Ramon y Cajal postdoctoral fellowship number RYC-2013-14717. Carles Hernandez is jointly funded by the MINECO and FEDER funds through grant TIN2014-60404-JIN.SĂ­Postprint (author's final draft

    Space Shuttle: A test vehicle for the reliability of the SkyWater 130nm PDK for future space processors

    Get PDF
    Recently the ASIC industry experiences a massive change with more and more small and medium businesses entering the custom ASIC development. This trend is fueled by the recent open hardware movement and relevant government and privately funded initiatives. These new developments can open new opportunities in the space sector – which is traditionally characterised by very low volumes and very high non-recurrent (NRE) costs – if we can show that the produced chips have favourable radiation properties. In this paper, we describe the design and tape-out of Space Shuttle, the first test chip for the evaluation of the suitability of the SkyWater 130nm PDK and the OpenLane EDA toolchain using the Google/E-fabless shuttle run for future space processors.This work was supported by ESA through the 4000136514/21/NL/GLC/my co-funded PhD activity ”Mixed Software/Hardware-based Fault-tolerance Techniques for Complex COTS System-on-Chip in Radiation Environments”. Moreover, it was partially supported by the Spanish Ministry of Economy and Competitiveness under grants PID2019-107255GB-C21 and IJC2020-045931-I (Spanish State Research Agency / http://dx.doi.org/10.13039/501100011033) and the European Community’s Horizon Europe programme under the METASAT project (grant agreement 101082622).Peer ReviewedPostprint (author's final draft
    • …
    corecore